INFERENCE OF HIGH DIMENSIONAL GRAMMARS*


K.  S.  Fu 


School  of  Electrical  Engineering 
Purdue University
W. Lafayette, Indiana 47907
U.S.A.

ABSTRACT 

Inference  of  high-dimensional  grammars  such  as  tree  grammars  and 
web  grammars  is  discussed.  The  k-tail  inference  procedure  for  finite-state 
grammars  is  extended  to  the  case  of  regular  tree  grammars.  The  behavior  of 
the k-tail procedure with variable values of k is studied. The derivation
diagram  of  context-free  web  languages  is  introduced.  A “semantic  teacher” 
is  used  for  the  inference  of  web  grammars.  Application  examples  in  picture 
and  scene  analysis  are  presented. 
INTRODUCTION

The  use  of  formal  linguistics  in  modeling 
natural  and  programming  languages  and  describing 
physical  patterns  and  data  structures  has  recently 
received  increasing  attention.  Grammar  or  syntax 
rules  are  employed  to  describe  the  syntax  of  languages 
or  the  structural  relations  of  patterns  or  data.  In 
order to model a language or to describe a class of
patterns or data structures under study more realistically,
it is hoped that the grammar used can be
directly inferred from a set of sample sentences or a
set of sample patterns (or data). Grammatical
inference is the problem of learning a grammar based
on a set of sample sentences. Potential applications
of grammatical inference include areas of
pattern recognition, information retrieval, programming
language design, translation and compiling,
graphics languages, man-machine communication, and
artificial intelligence.


In (1-3), inference of nonstochastic and
stochastic string grammars was surveyed and a
heuristic inference procedure for tree grammars was
presented. An inference procedure for transition
network grammars was proposed in (4). In this
paper, the k-tail inference procedure for finite-state
grammars (5) is extended to the case of regular tree
grammars. The behavior of the k-tail tree grammar
inference method for varying values of k is studied.
A web grammar interpretation of Winston's structure
learning is discussed and an inference procedure for
context-free web grammars is suggested.

K-TAIL  INFERENCE  METHOD  FOR  REGULAR 
TREE  GRAMMARS 

The  k-tail  inference  method  for  finite-state 
string  grammars  requires  an  integer  parameter  k as 
input  along  with  the  presentation  of  (positive) 
training samples (5). Sublanguages $S_w$ are created,
where

$$S_w = [\, x \mid wx \text{ is a string in the (positive) training set and } |x| \le k \,]$$

and $|x|$ is the length of $x$. Equivalent $S_w$ sets are then
combined to form the $i$th sublanguage. A rule
$A_i \to tA_j$ is produced if there is a string $w$ such that
$S_w$ is the $i$th sublanguage and $S_{wt}$ is the $j$th
sublanguage. The rule $A_i \to t$ is produced if there is a
string $w$ such that $S_w$ is the $i$th sublanguage and $wt$
is in the training sample. For strings, the exactness
of the grammar produced for any given training set
can be adjusted by varying $k$ from 0 up to the
length of the longest string in the training set. The
inferred languages vary correspondingly from something
close to the universal language to the
presentation itself. Thus, any method restricted to $k = 1$
will infer grammars whose languages
are very "loose" in their fit of the sample set.
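The string procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`k_tails`, `infer_rules`, `accepts`), the nonterminal names `A0, A1, ...`, and the two-string training set are hypothetical and do not come from the paper. A rule is stored as a triple `(lhs, terminal, rhs)`, with `rhs` set to `None` for a terminating rule $A_i \to t$.

```python
def k_tails(samples, k):
    """Return {w: S_w} for every prefix w of a sample, where
    S_w = {x : wx is a sample and len(x) <= k}."""
    samples = set(samples)
    prefixes = {s[:i] for s in samples for i in range(len(s) + 1)}
    return {
        w: frozenset(s[len(w):] for s in samples
                     if s.startswith(w) and len(s) - len(w) <= k)
        for w in prefixes
    }


def infer_rules(samples, k):
    """Infer finite-state rules; returns (rules, start_nonterminal)."""
    samples = set(samples)
    tails = k_tails(samples, k)
    names = {}
    # Equivalent S_w sets share one nonterminal; short prefixes are named first.
    for w in sorted(tails, key=lambda p: (len(p), p)):
        names.setdefault(tails[w], f"A{len(names)}")
    rules = set()
    for w in tails:
        for s in samples:
            if s.startswith(w) and len(s) > len(w):
                t = s[len(w)]
                # A_i -> t A_j : S_w is sublanguage i and S_wt is sublanguage j.
                rules.add((names[tails[w]], t, names[tails[w + t]]))
                if w + t in samples:
                    # A_i -> t : wt itself is a training string.
                    rules.add((names[tails[w]], t, None))
    return rules, names[tails[""]]


def accepts(rules, start, s):
    """Check whether the inferred rules generate the nonempty string s."""
    if not s:
        return False
    states = {start}
    for ch in s[:-1]:
        states = {rhs for (lhs, t, rhs) in rules
                  if lhs in states and t == ch and rhs is not None}
    return any(lhs in states and t == s[-1] and rhs is None
               for (lhs, t, rhs) in rules)


training = {"ab", "aab"}
loose, loose_start = infer_rules(training, 0)   # k = 0: very loose fit
exact, exact_start = infer_rules(training, 3)   # k = longest string length
print(accepts(loose, loose_start, "aaab"), accepts(exact, exact_start, "aaab"))
```

With this toy training set, the small value of k merges all non-final prefixes into one nonterminal, so unseen strings are generated, while k equal to the longest string length reproduces only the training strings.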

It  is  possible  to  extend  the  k-tail  method  for 
finite-state  string  grammars  to  regular  tree  grammars. 
The  method  is  as  follows: 

Step 1. Form the following collection:

$$C_t = [\,(\tau_1, \tau_2, \ldots, \tau_m) \mid t\tau_1\tau_2\ldots\tau_m \text{ is a tree in the training set and } |\tau_\ell| \le k \text{ for } \ell = 1, 2, \ldots, m\,]$$

where

$t$ is a tree with a single special frontier node.

$\tau_1, \tau_2, \ldots, \tau_m$ are any trees that can occur in
positions $1, 2, \ldots, m$.

$t\tau_1\tau_2\ldots\tau_m$ is the tree formed by concatenating
$\tau_\ell$ at the $\ell$th position of the special frontier node
of $t$.

$|\tau_\ell|$ is the depth of $\tau_\ell$ plus 1.

$m$ is the number of descendants of $t$ and is not
fixed to any particular integer.

Note that $\varepsilon$, the empty tree, is possibly a member of
$C_t$.

Step 2.

The collection $C_t$ of tuples of trees can be
partitioned into subcollections of $m$-tuples, where $m$
is a fixed integer for all elements of each subcollection:

$$C_t = C_{t0} \cup C_{t1} \cup \ldots \cup C_{tm}$$

where

$C_{t0} = [\varepsilon]$ if $t$ is a tree in the training set; otherwise $C_{t0} = \emptyset$.

$C_{t1} = [\,(\tau_1) \mid t\tau_1 \text{ is a tree in the training set and } |\tau_1| \le k\,]$

$C_{t2} = [\,(\tau_1, \tau_2) \mid t\tau_1\tau_2 \text{ is a tree in the training set, } |\tau_1| \le k \text{ and } |\tau_2| \le k\,]$

(Note here that the subscript indicates the
position a tree occupies and that $\tau_1$ in $C_{t1}$ is not
necessarily always the same tree, nor is it
the same tree as $\tau_1$ in $C_{t2}$.)

$C_{tm} = [\,(\tau_1, \tau_2, \ldots, \tau_m) \mid t\tau_1\tau_2\ldots\tau_m \text{ is a tree in the training set and } |\tau_\ell| \le k \text{ for } \ell = 1, 2, \ldots, m\,]$

Thus, $C_t$ is a collection of tuples of trees and $C_{ti}$
is a collection of $i$-tuples of trees, where $i$ is a fixed,
specified integer.

Each  of  these  collections  defines  all  of  the 
i-tuples  of  k-tail  trees  that  are  in  the  training  set  with 
root  attached  to  the  tree,  t,  at  its  special  frontier 
node.  The  collections  are  separated  in  this  way 
because an i-tuple and a j-tuple where i ≠ j cannot
be  generated  by  the  same  rule.  Thus,  we  will  now 
demonstrate  the  procedure  that  should  be  applied 
to  each  of  the  subcollections. 
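Steps 1 and 2 can be illustrated with a small Python sketch. The representation is an assumption made here for illustration: a tree is a nested tuple `(label, child, child, ...)`, the special frontier node of a context $t$ is marked by the hypothetical constant `HOLE`, and the empty tuple `()` plays the role of $\varepsilon$. The empty context (the tree $t = \varepsilon$) is not modeled, and the function names are not from the paper.

```python
from collections import defaultdict

HOLE = "<special>"  # marks the special frontier node of a context tree t


def depth(tree):
    """Depth of a tree (label, child, ...); a single node has depth 0."""
    _, *kids = tree
    return 0 if not kids else 1 + max(depth(c) for c in kids)


def contexts(tree):
    """Yield (t, cut) for every node of `tree`: t is the tree with the
    subtrees below that node removed and the node marked; cut is the
    tuple of removed subtrees, in position order."""
    label, *kids = tree
    yield (label, HOLE), tuple(kids)
    for i, kid in enumerate(kids):
        for ctx, cut in contexts(kid):
            yield (label, *kids[:i], ctx, *kids[i + 1:]), cut


def collect(training, k):
    """Step 1: C_t holds the tuples whose members all satisfy depth + 1 <= k."""
    C = defaultdict(set)
    for tree in training:
        for ctx, cut in contexts(tree):
            if all(depth(c) + 1 <= k for c in cut):
                C[ctx].add(cut)
    return C


def split_by_arity(tuples):
    """Step 2: partition C_t into C_t0, C_t1, ... by tuple length."""
    parts = defaultdict(set)
    for tup in tuples:
        parts[len(tup)].add(tup)
    return dict(parts)


# Two hypothetical training trees: $(b, b) and $(b(b, b), b).
training = [("$", ("b",), ("b",)),
            ("$", ("b", ("b",), ("b",)), ("b",))]
C = collect(training, k=1)
gamma = ("$", ("b", HOLE), ("b",))   # first descendant of the root is marked
print(split_by_arity(C[gamma]))
```

For the marked tree `gamma`, the collection contains both the empty tuple (the first training tree is `gamma` with nothing attached) and the pair of single-node trees contributed by the second training tree.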

Step  3. 

The next step is one which is not necessary
in the case of strings. It is necessary here because
a node can have several descendants and it may be
that only certain ordered combinations of descendants
are allowed. Thus, each subcollection of
$i$-tuples of trees, $C_{ti}$, must be further divided into
subcollections of $i$-tuples, each of which can be
expressed as the cartesian product of $i$ sets of trees.
Thus, $C_{ti}$ may be written:

$$C_{ti} = C_{ti1} \cup C_{ti2} \cup \ldots \cup C_{tin}$$

where

$$C_{tij} = [\,(\tau_1, \tau_2, \ldots, \tau_i) \mid \tau_1 \in S_{j1}, \tau_2 \in S_{j2}, \ldots, \tau_i \in S_{ji}\,] = [\,(\tau_1, \tau_2, \ldots, \tau_i) \mid (\tau_1, \tau_2, \ldots, \tau_i) \in S_{j1} \times S_{j2} \times \cdots \times S_{ji}\,]$$

That is, each $C_{tij}$ is characterized by $i$ sets, $S_{j\ell}$
($\ell = 1, 2, \ldots, i$), of trees from which the $\ell$th member
of an $i$-tuple must be selected. These $S_{j\ell}$ sets are
sublanguages of trees and may be regarded as a set
of trees generated by a particular nonterminal of
the tree grammar. The difficult part of this step
is to find those sets $S_{j\ell}$ which efficiently characterize
the $C_{tij}$. First of all, the resulting grouping is not
unique. One possible grouping would be that in
which each $C_{tij}$ has one element. This would not
be a good choice because each $C_{tij}$ will result in a
grammar rule. Thus, this choice would result in a
large number of rules. Since there are a finite number
of elements in $C_{ti}$, there are a finite number of
groupings and each of these can be tried. It is not
necessary that the $C_{tij}$ be disjoint. A particular
grouping would be optimum if it introduced a
minimum number of new $S_{j\ell}$ sublanguages.
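The grouping of Step 3 can be sketched as follows. The paper suggests trying all groupings; the sketch below instead uses a simple greedy heuristic, which is an assumption made here and not the paper's method, so it may not find the optimum grouping. The function name `greedy_products` is hypothetical. Each group is returned as a list of $i$ sets whose cartesian product lies inside the collection, and groups may overlap, which the text allows.

```python
from itertools import product


def greedy_products(tuples):
    """Cover a collection of i-tuples by cartesian products of i sets.

    Returns a list of groups; each group is a list of i sets S_1, ..., S_i
    with S_1 x ... x S_i contained in the collection."""
    tuples = set(tuples)
    uncovered = set(tuples)
    groups = []
    while uncovered:
        seed = min(uncovered, key=repr)          # deterministic choice
        sets = [{v} for v in seed]
        for pos in range(len(seed)):
            # Try to enlarge the set at this position with every value seen there.
            for cand in sorted({t[pos] for t in tuples}, key=repr):
                trial = sets[:pos] + [sets[pos] | {cand}] + sets[pos + 1:]
                if set(product(*trial)) <= tuples:
                    sets = trial
        groups.append(sets)
        uncovered -= set(product(*sets))
    return groups


# One product suffices here: {a, b} x {b}.
print(greedy_products({("a", "b"), ("b", "b")}))
# Two products are needed here, because (a, a) and (b, b) are absent.
print(len(greedy_products({("a", "b"), ("b", "a")})))
```

Each returned group corresponds to one subcollection $C_{tij}$, and each of its sets to one sublanguage $S_{j\ell}$, so fewer and larger groups mean fewer grammar rules.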

Now the rules for the grammar can be constructed.
Equivalent $S_{j\ell}$ sublanguages are combined
and a nonterminal is assigned corresponding to each
distinct sublanguage. Now a rule $A_{j\ell} \to x\,A_{n1} A_{n2} \ldots A_{nm}$
is produced if there is a tree $t$ such that:

1. $A_{j\ell}$ is the nonterminal corresponding to the
sublanguage $S_{j\ell}$.

2. There exists a $C_{tij}$ that contains the sublanguage
$S_{j\ell}$ in the $\ell$th position of its
specification.

3. $tx$ is a tree with $x$ concatenated at the $\ell$th
position of $t$.

4. There exists a $C_{txmn}$ which is specified by
the sublanguages $S_{n1}, S_{n2}, \ldots, S_{nm}$.

5. $A_{n1}, A_{n2}, \ldots, A_{nm}$ are the nonterminals
corresponding to the $S_{n1}, S_{n2}, \ldots, S_{nm}$
sublanguages, respectively.

6.  Either  x is  a tree  in  Sjg  where  a e S„ , . 

As. 

0cSna>  • • •^fSnmor  \aP  rl^ 


A rule $A_{j\ell} \to x$ is produced if conditions 1, 2 and 3
above are satisfied and $tx$ is in the training set.

To  illustrate  consider  the  following  example: 

Example  1: 

Consider the following regular tree grammar:

[Productions (1) through (5), given as tree diagrams in the original.]


The training set is the following:

[Training samples (1) through (9), given as tree diagrams in the original.]

Now assume $k = 1$ and construct the grammar as
follows:

Step 1:

(Note: Greek letters are used here to specify
the distinct trees which were all represented by $t$ in
the explanation above.)

Let $\alpha = \varepsilon$ (the empty tree).
Then $C_\alpha = \emptyset$.

Let $\beta = \$$.
Then $C_\beta = [(b,b)]$ (from sample 1).

Let $\gamma$ = [tree diagram] (the underline denotes where the
$(\tau_1 \ldots \tau_i)$ are concatenated).
Then $C_\gamma = [\varepsilon, (b,b)]$ (from samples 1 and 2, respectively).

Let $\delta$ = [tree diagram].
Then $C_\delta = [\varepsilon, (b,b)]$ (from samples 1 and 3, respectively).

Let $\zeta$ = [tree diagram].
Then $C_\zeta = [\varepsilon, (b,b)]$ (from samples 2 and 5).

Let $\eta$ = [tree diagram].
Then $C_\eta = [\varepsilon]$ (from sample 8).

Let $\theta$ = [tree diagram].
Then $C_\theta = [(a,b)]$ (from sample 8).

Let $\lambda$ = [tree diagram].
Then $C_\lambda = [\varepsilon]$ (from sample 8).

Let $\mu$ = [tree diagram].
Then $C_\mu = [\varepsilon, (b,b)]$ (from samples 8 and 9).

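Step 2:

$C_\alpha = C_{\alpha 0} = \emptyset$

$C_\beta = C_{\beta 2} = [(b,b)]$

$C_\gamma = C_{\gamma 0} \cup C_{\gamma 2}$ where $C_{\gamma 0} = [\varepsilon]$ and $C_{\gamma 2} = [(b,b)]$

$C_\delta = C_{\delta 0} \cup C_{\delta 2}$ where $C_{\delta 0} = [\varepsilon]$ and $C_{\delta 2} = [(b,b)]$

$C_\zeta = C_{\zeta 0} \cup C_{\zeta 2}$ where $C_{\zeta 0} = [\varepsilon]$ and $C_{\zeta 2} = [(b,b)]$

$C_\eta = C_{\eta 0} = [\varepsilon]$

$C_\theta = C_{\theta 2} = [(a,b)]$

$C_\mu = C_{\mu 0} \cup C_{\mu 2}$ where $C_{\mu 0} = [\varepsilon]$ and $C_{\mu 2} = [(b,b)]$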


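Step 3:

$C_{\alpha 0} = C_{\alpha 01} = \emptyset$

$C_{\beta 2} = C_{\beta 21} = [\,(\tau_1, \tau_2) \mid \tau_1 \in B, \tau_2 \in B\,]$
where $B$ is the sublanguage of trees $[b]$

$C_{\gamma 0} = C_{\gamma 01} = [\varepsilon]$

$C_{\gamma 2} = C_{\gamma 21} = [\,(\tau_1, \tau_2) \mid \tau_1 \in B, \tau_2 \in B\,]$

$C_{\delta 0} = C_{\delta 01} = [\varepsilon]$

$C_{\delta 2} = C_{\delta 21} = [\,(\tau_1, \tau_2) \mid \tau_1 \in B, \tau_2 \in B\,]$

$C_{\zeta 0} = [\varepsilon]$

$C_{\zeta 2} = C_{\zeta 21} = [\,(\tau_1, \tau_2) \mid \tau_1 \in B, \tau_2 \in B\,]$

$C_{\eta 0} = [\varepsilon]$

$C_{\theta 2} = C_{\theta 21} = [\,(\tau_1, \tau_2) \mid \tau_1 \in A, \tau_2 \in B\,]$
where $A$ is the sublanguage of trees $[a]$

$C_{\lambda 0} = [\varepsilon]$

$C_{\mu 0} = [\varepsilon]$

$C_{\mu 2} = C_{\mu 21} = [\,(\tau_1, \tau_2) \mid \tau_1 \in B, \tau_2 \in B\,]$

Now the nonterminals and their equivalent sublanguages
are enumerated: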


Nonterminal    Sublanguage

S              ∅
A              [a]
B              [b]
E              [ε]


Now the grammar rules can be constructed:

From the relation $\beta = \alpha\$$:

$S \to \$\,B\,B$

1. $S$ is the nonterminal corresponding to the
sublanguage $\emptyset$.

2. $C_{\alpha 01}$ has the sublanguage $\emptyset$ concatenated
at its 1st position (i.e., there are no trees of
depth 0 in the training set).

3. $\beta = \alpha\$$ is a tree with \$ concatenated in the
1st position.

4. $C_{\beta 21}$ is specified by the sublanguages $[b]$
and $[b]$, respectively.

5. $B$ and $B$ are the nonterminals corresponding
to $[b]$ and $[b]$.

From the relation $\gamma = \beta b$:

$B \to b\,B\,B$

Note:

1. $B$ is the nonterminal corresponding to the
sublanguage $[b]$.

2. $C_{\beta 21}$ has $[b]$ in the 1st position (at depth 2).

3. $\gamma = \beta b$ is a tree with $b$ concatenated in the
1st position.

4. $C_{\gamma 21}$ is specified by the sublanguages $[b]$
and $[b]$, respectively.

5. $B$ and $B$ are the nonterminals corresponding
to $[b]$ and $[b]$.

Also $B \to b$ because the tree $\gamma = \beta b$ (sample 1)
is in the training set.

Now consider the relation $\theta = \gamma ab$.

This yields the rule:

$B \to b\,A\,B$

because:

1. $B$ is the nonterminal corresponding to $[b]$.

2. $C_{\gamma 21}$ has $[b]$ in its 1st position.

3. $\theta = \gamma ab$ is a tree with $b$ concatenated in its
2nd position.

4. $C_{\theta 21}$ is specified by the sublanguages $[a]$
and $[b]$, respectively.

5. $A$ and $B$ are the nonterminals corresponding
to $[a]$ and $[b]$, respectively.

The relation $\lambda = \theta ab$ yields

$A \to a$

because $\lambda$ is in the training set.

Now notice that the nonterminal $E$ does not appear
on the left-hand side of any rule and can be
ignored. This is because it corresponds to the
sublanguage $[\varepsilon]$, which means the tree has
terminated without further descendants.

Further tests with subtrees from the training
set will show that all the rules have now been found.
The entire production set, as derived above, is:

$S \to \$\,B\,B$

$B \to b\,B\,B$

$B \to b$

$A \to a$

$B \to b\,A\,B$
Note  that  this  grammar  generates  all  of  the  samples 
in  the  training  set  and  in  fact  generates  a language 
larger  than  the  real  one.  For  example,  this  grammar 
would  generate  the  following  trees  which  are  not  in 
the  real  language: 


[Two tree diagrams; only the root label \$ and descendants labeled A are legible.]


Similarly, for $k = 2$, we have the inferred production set

[Production for $S$, given as a tree diagram in the original.]

$B \to b\,A\,C$

$B \to b$

$A \to a$

$C \to b\,A\,B$

The  language  generated  by  this  grammar  is  exactly 
that  generated  by  the  true  grammar. 

For $k = 5$, we have the production set

[Production set given as tree diagrams in the original; the legible fragments involve the nonterminals S, A, B, C, D, and E.]

The  language  generated  by  this  grammar  is  exactly 
the  training  set. 

The tree grammar inference methods of Bhargava
and Fu (6) and Gonzalez and Thomason (7)
are similar in that they both assume recursiveness
whenever there is the slightest evidence of it. It is
in this sense that they are similar to the k-tail
method with k = 1. In the k-tail method, when k = 1,
the "loosest" nontrivial grammar is produced. In
many cases, this will be the same grammar as produced
by both methods. The k-tail method will
produce more satisfactory grammars when k > 1
and when the training set is of adequate size.


AN  INFERENCE  PROCEDURE  FOR  WEB 
GRAMMARS 


In his work on language identification in the
limit, Gold (8) noted the importance of correctly
ordering the information sequence. Most other
grammatical inference researchers have also noted
this importance. An interesting demonstration of
the need to carefully select the training sequence is
the work by Winston (9). The purpose of the work
was to develop a system which could learn structural
descriptions of scenes by analyzing specially selected
examples. This work is now formalized and related
to the grammar inference problem.

The basic idea will be to correlate the derivation
diagram of a web grammar with the semantic
net used by Winston. Then, by following the steps
used by Winston on the semantic net and finding
equivalent steps for the derivation diagram, the
method can be translated into web grammar terminology.
The result will be a grammatical inference
procedure for web grammars which can be applied
more generally than in the specific block world considered.
A brief review of the derivation diagram
of web grammars (10) will be required to support
this discussion.

1. The Derivation Diagram of Context-Free Web Grammars

Study of the context-free class of web languages
reveals that many of the formal language properties
of string languages also hold for the corresponding
web languages. One example is the existence for
context-free web grammars of a structure similar
to a derivation tree for context-free string grammars
(10). The definition of this structure, called a
derivation diagram, is now given and an example is
given in Figure 1.

A new, unique relation called the direct descendant
relation is introduced. For a pair of nodes
$(n_1, n_2)$ connected by this relation, $n_2$ is called
the direct descendant of $n_1$, and $n_1$ is called the direct
ancestor of $n_2$. A node $n_k$ is called a descendant
of $n_1$ if there is a sequence $n_1, \ldots, n_k$ such that $n_{i+1}$
is a direct descendant of $n_i$. $n_1$ is called an ancestor
of $n_k$.


Definition 1:

D, a web, is a derivation diagram for a context-free
web grammar $G = (V_N, V_T, P, S)$ if:

(1) There is one node called the root with no
ancestors whose label is $S$, the start
symbol of $G$.

(2) All other nodes have exactly one direct
ancestor and every node is a descendant
of the root.

(3) Every node has a label which is a symbol
in $V_N \cup V_T$.

(4) If a node $n$ has at least one descendant
and has label $A$, then $A$ must be in $V_N$.

(5) If nodes $n_1, n_2, \ldots, n_k$ are the direct
descendants of node $n$ with labels $A_1, A_2, \ldots, A_k$,
respectively, $A \to \beta$ must be a production
of $P$ of $G$, where $N_\beta = [n_1, n_2, \ldots, n_k]$
and $A_i$ is the label of the node $n_i$ in
$\beta$, $i = 1, \ldots, k$.

(6) $n_i$ and $n_j$ are connected by relation $r$ if
and only if

a) one is the direct descendant of the
other and $r$ is the direct descendant
relation, or

b) $n_i$ and $n_j$ are both direct descendants
of $A$, $A \to \beta$ is a rule in $P$, and the edge
$r$ between $n_i$ and $n_j$ is a subweb of $\beta$, or

c) $n_i$ and some node $n_k$ are connected
by relation $r$, $n_j$ is the direct
descendant of $n_k$ through the rule $A \to \beta$,
a rule in $P$, and the $r$ between $n_i$ and
$n_j$ results from the embedding mapping
of $A$.


There are two kinds of subdiagrams which are
of interest. The first, called the skeleton of the
derivation diagram, is obtained by keeping all nodes
and all direct descendant relations and erasing all
other relations. The result, shown in Figure 1(c),
illustrates the basic structure of the derivation.

The second subdiagram of interest is called a
section. If $m_i$ is a frontier node of the skeleton
(i.e., has no descendants), let $n_0, \ldots, m_i$ be a path
to $m_i$ from the root node $n_0$ along only descendant
edges. Let $m_1, m_2, \ldots, m_k$ be all of the frontier
nodes. Then a set $C$ of nodes of the derivation
diagram is a crosscut set if $C \cap [n_0, \ldots, m_i]$ is a
singleton for all $1 \le i \le k$. A crosscut set, $C$, together
with all of the edges of the derivation diagram
between nodes of $C$ is called a section. Naturally,
only those edges are kept which are connected to
two nodes which are both kept. A section,
shown in Figure 1(d), illustrates the basic
structure of sentential forms.
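The crosscut condition can be checked mechanically. The following Python sketch assumes a hypothetical representation: the skeleton is a dictionary from a node to the list of its direct descendants, and the remaining relations are triples `(node, relation, node)`. The node names and relation labels in the example are made up for illustration.

```python
def frontier_paths(children, root):
    """All root-to-frontier paths of the skeleton (direct-descendant edges only)."""
    kids = children.get(root, [])
    if not kids:
        return [[root]]
    return [[root] + path
            for kid in kids
            for path in frontier_paths(children, kid)]


def is_crosscut(children, root, nodes):
    """True if `nodes` meets every root-to-frontier path in exactly one node."""
    nodes = set(nodes)
    return all(len(nodes & set(path)) == 1
               for path in frontier_paths(children, root))


def section(edges, nodes):
    """Keep only the edges whose two endpoints both lie in the crosscut set."""
    nodes = set(nodes)
    return [(u, r, v) for (u, r, v) in edges if u in nodes and v in nodes]


# Hypothetical skeleton: n0 -> n1, n2 and n1 -> n3, n4.
children = {"n0": ["n1", "n2"], "n1": ["n3", "n4"]}
# Hypothetical non-descendant relations of the derivation diagram.
edges = [("n1", "left of", "n2"), ("n3", "under", "n4"), ("n4", "left of", "n2")]

print(is_crosscut(children, "n0", {"n1", "n2"}))   # a sentential form
print(is_crosscut(children, "n0", {"n1"}))         # misses the path to n2
print(section(edges, {"n3", "n4", "n2"}))
```

The root alone and the full frontier are both crosscut sets, which matches the reading of sections as sentential forms between the start symbol and the final web.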

2.  Interpretation  of  Winston’s  System 

An example of the type of scene Winston's
system analyzes is shown in Figure 2. The sequence
of examples Winston found necessary to train the
system is shown in Figure 3. Notice that Winston's
method uses negative samples in the form of "near
misses" as shown in scene 2 and scene 3. The
description that is finally learned is shown in Figure 4.
It is assumed that all of the concepts illustrated
(except ARCH) have already been learned. Each
sample in the training sequence is constructed so
that it has only one difference from the already
learned description. Scene 2 illustrates that the
supports of the arch must not abut. Scene 3
illustrates that A must be supported by B and C. Scene
4 illustrates that a more general object than a BRICK
may be used as a top.

[Figure legend: S = start symbol; A = arch; B = block; P = pillar; C = crossbar; F = front; I = side; T = top. Relations: f = in front of; u = under; l = left of.]

[Figure 2. An Example of an Arch]

The description in Figure 4 can be interpreted
as a hierarchical graph model and as a derivation
diagram of a web grammar. As such, it can be
converted to a web grammar. Some of the rules of
this grammar are shown in Figure 5. These rules
are created from Figure 4 by generating a rule when
a relationship such as "a-kind-of" or "one-part-is"
is encountered in the diagram. Thus, the grammar
will have a derivation diagram similar to Figure 4.
In this case, the system is learning one rule. That
is, it is trying to find the predicate which describes
the right side of rule (1). If this predicate can be
learned, it can then be used to analyze higher order
patterns containing it.

Many important nonterminals in a web grammar
will not occur in recursive rules. These nonterminals
will be important because they represent important
semantic concepts which give "meaning" to the
structural descriptions. To learn an individual rule
in a web grammar, the system must be able to learn
the most general description possible for each object
on the right-hand side. Assuming the form of the
rule is known (this is generally learned from the
first sample), then learning the exact rule becomes
a matter of finding how much each object may be
generalized. In this case, the original description
of ARCH might contain the objects A, B, and C;
that is, an exact description of this particular scene.
This description would be of little general use
because no slightly different arch could be identified.
Even the appropriate parse of this scene is not known
because grammars describing it might be ambiguous.

In a general formalism an object like A is
described by properties like orientation and shape.
These properties allow successive generalization to
occur according to what values of a particular property
are important. The structure which describes
and systematizes the generalization process is called
the property lattice.

Definition 2:

A set of elements $C = [c_1, c_2, \ldots]$ is said to
be partially ordered (hierarchical) if there exists
a relation ($\le$) defined on the elements of $C$ which is:

(1) Reflexive: $c \le c$.

(2) Antisymmetric:
$c_1 \le c_2$ and $c_2 \le c_1$ implies $c_1 = c_2$;

(3) Transitive:
$c_1 \le c_2$ and $c_2 \le c_3$ implies $c_1 \le c_3$.

If $C$ is a partially ordered set and $X$ is any
subset of $C$, then $a \in C$ is a lower bound of $X$ if
$a \le x$ for all $x \in X$, and $a$ is an upper bound of $X$ if $x \le a$
for all $x \in X$. A lower bound $b$ of $X$ is called the
greatest lower bound (g.l.b.) of $X$ if for every $a$ that
is a lower bound of $X$, $a \le b$. Similarly, an upper
bound $d$ of $X$ is called the least upper bound if for
every $e$ that is an upper bound of $X$, $d \le e$. A
partially ordered set $C$ in which any two elements
have a least upper bound and a greatest lower bound
is called a lattice.

[Figure 3: Scene 1, an arch; Scene 2, not an arch; Scene 3, not an arch; Scene 4, an arch.]

[Figure 4. Derivation of an Arch]

[Figure 5. Some Web Grammar Rules Describing an Arch. Rule (1): (ARCH) → (LYING PRISM), (STANDING BRICK), (STANDING BRICK), where $r_1$ = must be supported by and $r_2$ = left of but must not abut. The remaining rules involve (OBJECT), (BRICK), and (ARCH).]

In the case of concept learning here, the
elements of C are called concepts and consist of
subsets of samples containing certain property values.
The partial order relation considered is set inclusion.
The purpose of the learning procedure will be to
find that concept which contains all of the samples
showing allowed property values and none of
the samples having disallowed property values. The
procedure to be used in learning a concept will
be as follows:

(1)  Whenever  a set  of  positive  samples  are 
given,  then  all  lower  bounds  of  the  set 
in  the  lattice  are  allowed  as  the  possible 
concept.  The  least  upper  bound  of  the 
set  and  all  its  lower  bounds  are  also 
allowed. 

(2)  Whenever  a set  of  negative  samples  are 
given,  then  all  upper  bounds  of  the  set 
in  the  lattice  are  disallowed  as  the  possible 
concept.  The  greatest  lower  bound  of 
the  set  and  all  its  upper  bounds  are  also 
disallowed.

(3)  Whenever  a new  positive  sample  is  given, 
then  the  new  allowed  part  of  the  lattice 
is the set of all lower bounds of the least
upper  bound  of  the  new  example  and 
the  previously  learned  least  upper  bound. 

(4)  Whenever  a new  negative  sample  is  given, 
then  the  new  disallowed  part  of  the  lattice 
is  the  set  of  all  upper  bounds  of  the 
greatest  lower  bound  of  the  new  example 
and  the  previously  learned  greatest  lower 
bound. 

(5)  When  all  of  the  points  in  the  lattice  are 
either  allowed  or  disallowed,  the  correct 
concept  is  the  least  upper  bound  of  the 
allowed  part  of  the  lattice  and  is  said 
to  have  been  learned. 
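The five steps above can be followed literally in a small Python sketch. The lattice is assumed here to be the full set of subsets of the property values, ordered by inclusion, so that the least upper bound is a union and the greatest lower bound is an intersection; that choice, the function names, and the value codes "00", "01", "11" used in the example are assumptions for illustration. Each sample is given as a set of values.

```python
from itertools import combinations


def powerset(values):
    """All subsets of `values`, as frozensets (the assumed lattice points)."""
    values = sorted(values)
    return {frozenset(c)
            for r in range(len(values) + 1)
            for c in combinations(values, r)}


def learn_concept(values, positives, negatives):
    """Return (concept, allowed, disallowed); concept is None until every
    lattice point is either allowed or disallowed (step 5)."""
    lattice = powerset(values)
    # Steps (1) and (3): allowed part = lower bounds of the l.u.b. of the positives.
    lub = frozenset().union(*positives)
    allowed = {c for c in lattice if c <= lub}
    # Steps (2) and (4): disallowed part = upper bounds of the g.l.b. of the negatives.
    if negatives:
        glb = frozenset(values).intersection(*negatives)
        disallowed = {c for c in lattice if c >= glb}
    else:
        disallowed = set()
    decided = ((allowed | disallowed) == lattice) and not (allowed & disallowed)
    return (lub if decided else None), allowed, disallowed


values = {"00", "01", "11"}
concept, allowed, disallowed = learn_concept(
    values, positives=[{"00"}, {"01"}], negatives=[{"11"}])
print(sorted(concept))
```

With two positive samples and one negative sample, every point of this small lattice is decided and the learned concept is the least upper bound of the positives. With a single positive sample and no negatives, the function returns `None` for the concept because part of the lattice is still undecided.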

The purpose of this study will be to see how
the lattice can help in selecting a good training set
and to see how grammars can help in setting up the
lattice. In many practical cases, properties are
neither all independent nor all dependent. In these
cases, the property lattice is more nonuniform.
Fortunately, the property lattice can be constructed
from the grammar if the grammar is in the right
form, as is shown in Figure 6. Note in this case that
a (STANDING TRIANGULAR PRISM) is not allowed
by the grammar, so the higher order concepts
(STANDING) and (TRIANGULAR PRISM) are also
not present. Now, the number and selection of
samples necessary to learn a concept in this lattice
can be investigated. To generalize to the concept
(PRISM), 2 positive samples, (STANDING BRICK)
and (LYING TRIA PRISM), must be given. To
generalize only to (BRICK) or (LYING), all three
samples (2 positive and 1 negative) must be given.
To generalize to (STANDING BRICK) only, two
samples must be given.

Given the grammar:

(PRISM) → (BRICK)
(PRISM) → (LYING)
(BRICK) → (LYING BRICK)
(BRICK) → (STANDING BRICK)
(LYING) → (LYING BRICK)
(LYING) → (LYING TRIA PRISM)

[Lattice diagram, with (ANY PRISM) as the top element and $\phi$ as the bottom element.]

Figure 6. A Lattice Constructed From A Grammar
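The construction of the lattice from the grammar of Figure 6 can be sketched in Python. The idea assumed here is that each concept is identified with the set of most specific concepts derivable from it, and that concepts are ordered by inclusion of those sets; the dictionary `GRAMMAR`, the name `EMPTY` for the bottom element, and the function names are illustrative choices, not the paper's notation.

```python
# The six productions of Figure 6, as a map from a concept to its refinements.
GRAMMAR = {
    "PRISM": ["BRICK", "LYING"],
    "BRICK": ["LYING BRICK", "STANDING BRICK"],
    "LYING": ["LYING BRICK", "LYING TRIA PRISM"],
}


def extent(symbol):
    """Set of most specific concepts derivable from `symbol`."""
    if symbol not in GRAMMAR:
        return frozenset({symbol})
    return frozenset().union(*(extent(s) for s in GRAMMAR[symbol]))


def build_lattice():
    """Map every concept of the grammar to its extent, plus the bottom element."""
    symbols = set(GRAMMAR) | {s for rhs in GRAMMAR.values() for s in rhs}
    table = {s: extent(s) for s in symbols}
    table["EMPTY"] = frozenset()
    return table


def least_upper_bound(table, names):
    """Least concept whose extent contains the extents of all `names`."""
    uppers = [c for c, ext in table.items()
              if all(table[n] <= ext for n in names)]
    least = [u for u in uppers
             if all(table[u] <= table[v] for v in uppers)]
    return least[0] if len(least) == 1 else None


lattice = build_lattice()
print(least_upper_bound(lattice, ["STANDING BRICK", "LYING TRIA PRISM"]))
print(least_upper_bound(lattice, ["LYING BRICK", "STANDING BRICK"]))
```

The first call generalizes to the prism concept from the two positive samples named in the text, and the second stops at the brick concept. No (STANDING) concept appears in the table because the grammar has no production that would introduce it.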

Thus, by using the grammatical formalism
for  lower  order  concepts,  such  as  (PRISM),  a more 
efficient  lattice  structure  can  be  set  up.  If  this 
lattice  is  big  enough,  there  is  less  necessity  for  a 
“near  miss”  to  be  so  near  because  samples  which 
are  more  different  will  still  have  a least  upper  bound 
and  greatest  lower  bound  in  the  lattice.  This  lattice 
structure  can  help  in  the  selection  of  proper  training 
samples  for  higher  order  concepts  such  as  (ARCH). 


3.  An  Inference  Procedure 

In terms of formal grammatical inference,
Winston's procedure, as just formalized, can be
stated as follows:

(1) Assume that a given set of properties and
predicate forms are known to be appropriate
from a priori information about the application.

(2) Given a sample, get all possible parses of it
with these forms and arrange the parse
nonterminals in a property lattice.

(3) Then, by giving a sequence of appropriate
positive and negative samples, and using least
upper and greatest lower bound operations
in the lattice, converge to the correct parse
common to all positive samples and including
no negative samples.

(4) Construct the grammar rule reflecting this parse.

An example of applying this procedure to a
Winston-like problem is now given.


Example  2: 

Assume we are given a problem in which the
only objects are rectangular prisms and the only
properties detectable are size, shape, and color.
Furthermore, assume that green cubes do not exist. A
lattice illustrating these properties is shown in
Figure 7. The objects, properties, and relations are
summarized below in Table 1.


Table 1. Objects, Properties, and Relations for Example 2

Object               Properties   Values                     Relations

Rectangular prisms   Size         Larger, Smaller            Supported by
                     Shape        Cube, Rectangular prism    Larger-smaller
                     Color        Red, Green                 Same color

We now wish to learn the concept of a pyramid.
For illustrative purposes it is assumed that a legal
pyramid can have cubes or rectangular prisms but
supporting objects can only be red in color. That
is, only the top object can be green. To begin, a
positive sample of a pyramid (shown in Figure 8)
is presented and the pattern is parsed. The resulting
parse, derivation diagram, or semantic net is shown
in Figure 9.

Now, by presenting an appropriate sequence
of positive and negative samples, the teacher must
illustrate the most general object or relation which
is allowed in each position. This example will
concentrate just on the objects and for the moment
ignore the fact that the relations must be learned
also. The supporting objects in the pyramid can be
any shape but must be red. This is illustrated by
the (00, 01) entry in the lattice. This can be
illustrated by three samples: 00 and 01 as positive
samples and 11 as a negative sample. The top
object can be green. Since a cube cannot be green,
this is illustrated by the (00, 01, 11) entry in the
lattice. This state in the lattice can be learned
by presenting 00, 01, and 11 as positive samples.
Thus, for each individual object, three samples must
be given. But, since these can occur in various
combinations with the other objects, a total of 27
combinations must be presented to completely learn
the definition of the pyramid. The samples are
shown in Figure 10. Note that if the objects can
be considered independent only seven samples need
to be given. These samples are shown with asterisks
in Figure 10.

[Figure 7. Lattice for Example 2 (caption partly illegible). Object lattice with P = (P1, P2); P1 = color, 0 = red, 1 = green; P2 = shape, 0 = cube, 1 = rectangle; top element (00, 01, 11).]

[Figure 8. An Example of a Pyramid]

[Figure 9. Parse of Figure 8]
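The counts of 27 and 7 can be reproduced with a short enumeration. In this Python sketch a sample is a triple of codes; the ordering (top object first, then the two supporting objects), the baseline sample `("00", "00", "00")`, and the function names are assumptions made for illustration.

```python
from itertools import product

CODES = ("00", "01", "11")   # red cube, red rectangle, green rectangle
# Allowed codes per position: the top object may be anything that exists,
# each supporting object must be red.
ALLOWED = (set(CODES), {"00", "01"}, {"00", "01"})


def is_pyramid(sample):
    """A sample is positive if every object carries an allowed code."""
    return all(code in ok for code, ok in zip(sample, ALLOWED))


def all_samples():
    """Every combination of codes over the three objects."""
    return list(product(CODES, repeat=len(ALLOWED)))


def independent_samples(baseline=("00", "00", "00")):
    """If the objects are independent, vary one object at a time."""
    picks = {baseline}
    for i in range(len(baseline)):
        for code in CODES:
            picks.add(baseline[:i] + (code,) + baseline[i + 1:])
    return picks


samples = all_samples()
positives = [s for s in samples if is_pyramid(s)]
reduced = independent_samples()
print(len(samples), len(positives), len(samples) - len(positives), len(reduced))
```

The enumeration gives 27 combinations, of which 12 are positive and 15 negative, and the one-at-a-time selection gives 7 samples, of which 5 are positive and 2 negative.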

The derivation diagram which is finally learned
is shown in Figure 11. The grammar rule learned is
extracted from this diagram by putting the ancestor
of the "one-part-is" relation on the left-hand side
and the descendants on the right-hand side. This
rule is shown in Figure 12. The embedding of this
rule is somewhat arbitrary.

Several conclusions can be drawn from this
example. First, if there are several properties involved
and these properties take on several values and it is
necessary to learn a pattern containing several objects,
then many samples must be used in training unless
some heuristic assumption is made. Second, if one
part of the pattern can be assumed independent of
other parts, the number of samples needed to learn
it can be greatly reduced. Third, this method as
shown does not specify the embedding.

Each sample is a column of three codes, in the order: code for C, code for B, code for A (for example 00, 00, 00).

Samples which must be presented:

Positive samples:

00*  00*  00*  00   01*  01   01   01   11*  11   11   11
00   00   01   01   00   00   01   01   00   00   01   01
00   01   00   01   00   01   00   01   00   01   00   01

Negative samples:

00*  00   00*  00   00   01   01   01   01   01   11   11   11   11   11
00   01   11   11   11   00   01   11   11   11   00   01   11   11   11
11   11   00   01   11   11   11   00   01   11   11   11   00   01   11

Figure 10. Training Samples

[Figure 12. The Resulting Grammar Rule]

CONCLUSIONS  AND  REMARKS 

This  paper  presents  some  preliminary  results  in 
the  inference  of  tree  and  web  grammars.  It  is  hoped 
that  the  preliminary  results  will  stimulate  new  and 
better  inference  methods  for  high-dimensional 
grammars,  particularly  concerning  the  quality  of 


inference (or the "goodness of fit") and the applicability
to real-world problems. A proposal for inferring
web grammar from pictorial patterns can be found
in (10).